FUW TRENDS IN SCIENCE & TECHNOLOGY JOURNAL

(A Peer Review Journal)
e–ISSN: 2408–5162; p–ISSN: 2048–5170

TRANSFER LEARNING ON CROSS-LINGUAL NEWS CLASSIFICATION OF LOW-RESOURCE YORUBA LANGUAGE USING BI-LSTM ON A SIAMESE NETWORK
Pages: 502-507
1Abdullah, Khadijha-Kuburat Adebisi, 2Sodimu Segun Michael, 3Efuwape Biodun Tajudeen, 4Olasupo Ahmed Olalekan


Keywords: BiLSTM, Cross-lingual, Low-resource, L2-regularization, Siamese network

Abstract

Most existing language models cannot handle low-resource textual data because of diversity in language representation and the scarcity of text corpora. Transfer learning from a high-resource language can help in such cases, even when vocabulary overlap between the languages is limited. We therefore perform cross-lingual news classification with a Siamese network that learns to encode sentences from few samples into shared embedding features, starting from a monolingual pretrained model and using a Bidirectional Long Short-Term Memory (BiLSTM) encoder. The BiLSTM sequence model takes news-article sentences from each language as input sequences and independently learns monolingual embeddings from parallel corpora using Skip-gram with negative sampling. A lexicon is employed to enhance the language model of the low-resource language, and each language is encoded into its own feature representation. These embeddings are then jointly aligned into a common cross-lingual feature space that captures the semantic structure of both languages. The model is trained by minimising an L2-regularised softmax cross-entropy loss (L_RCE) with the Adam optimizer. At the end of 100 epochs, the model attains an accuracy of 0.84 with a loss of 0.24, while precision, recall and F1-score are 0.88, 0.92 and 0.89 respectively. The correct predictions in the confusion matrix increase as the epochs increase and the loss function decreases. The experiments cover an aligned-sentence task in two languages, English and Yoruba, and show that embeddings trained with the pretrained sequence BiLSTM are improved with monolingual data.
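The abstract describes a Siamese architecture in which English and Yoruba sentences pass through a shared BiLSTM encoder and a softmax classifier regularised with L2 and trained with Adam. The following is a minimal Keras sketch of that setup, not the authors' implementation: the vocabulary size, embedding dimension, sequence length, number of news categories, regularisation strength, and the concatenation of the two branch features are all illustrative assumptions, and the embeddings are trained from scratch rather than initialised from aligned Skip-gram vectors as in the paper.

```python
# Hypothetical sketch of a Siamese BiLSTM news classifier (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

VOCAB_SIZE = 20000   # assumed joint vocabulary size
EMBED_DIM = 300      # assumed embedding dimension (Skip-gram vectors in the paper)
MAX_LEN = 100        # assumed maximum sentence length
NUM_CLASSES = 4      # assumed number of news categories

def build_encoder():
    """Shared BiLSTM encoder mapping a token sequence to a sentence embedding."""
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
    # The paper initialises embeddings from monolingual Skip-gram vectors aligned
    # into a shared cross-lingual space; here they are learned from scratch.
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(inp)
    x = layers.Bidirectional(layers.LSTM(128))(x)
    return models.Model(inp, x, name="shared_bilstm_encoder")

encoder = build_encoder()

# Siamese branches: English and Yoruba sentences share the same encoder weights.
inp_en = layers.Input(shape=(MAX_LEN,), dtype="int32", name="english_input")
inp_yo = layers.Input(shape=(MAX_LEN,), dtype="int32", name="yoruba_input")
feat_en = encoder(inp_en)
feat_yo = encoder(inp_yo)

# Joint cross-lingual representation fed to an L2-regularised softmax classifier.
merged = layers.Concatenate()([feat_en, feat_yo])
out = layers.Dense(
    NUM_CLASSES,
    activation="softmax",
    kernel_regularizer=regularizers.l2(1e-4),  # assumed regularisation strength
)(merged)

model = models.Model([inp_en, inp_yo], out)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",  # softmax cross-entropy; L2 term added by the regulariser
    metrics=["accuracy"],
)
model.summary()
```

Under these assumptions the combined loss corresponds to the cross-entropy over the softmax outputs plus the L2 penalty on the classifier weights, which is what the abstract denotes L_RCE.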
